Drowsiness on the road is a widespread problem with fatal consequences, and a multitude of detection systems and techniques have been proposed. Among existing methods, Ghoddoosian et al. utilized temporal blinking patterns to detect early signs of drowsiness, but their algorithm was tested only on a powerful desktop computer, which is impractical in a moving vehicle. In this paper, we propose an efficient platform to run Ghoddoosian's algorithm, detail the performance tests we ran to select this platform, and explain our threshold optimization logic. After considering the Jetson Nano and the Beelink (Mini PC), we concluded that the Mini PC is the most efficient and practical option for running our embedded system in a vehicle. To determine this, we ran communication speed tests and evaluated total processing times for inference operations. In our experiments, the average total processing time of the drowsiness detection model was 94.27 ms on the Jetson Nano and 22.73 ms on the Beelink (Mini PC). Considering the portability and power efficiency of each device, along with the processing-time results, the Beelink (Mini PC) was determined to be the most suitable. We also propose a threshold optimization algorithm that decides whether the driver is drowsy or alert based on the trade-off between the sensitivity and specificity of the drowsiness detection model. Our study serves as a crucial next step for drowsiness detection research and its application in vehicles: through our experiments, we have determined a favorable platform that can run drowsiness detection algorithms in real time and can be used as a foundation for further research. In doing so, we bridge the gap between an existing embedded system and its actual implementation in vehicles, bringing drowsiness detection technology a step closer to widespread real-life deployment.
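To make the sensitivity/specificity trade-off concrete: one standard way to choose an operating threshold is to sweep candidate thresholds and maximize Youden's J statistic (sensitivity + specificity − 1). The sketch below illustrates that general idea with hypothetical per-window scores and binary labels; it is not necessarily the paper's exact optimization algorithm.

```python
import numpy as np

def optimize_threshold(scores, labels):
    """Sweep candidate thresholds and keep the one maximizing Youden's
    J = sensitivity + specificity - 1 (one common way to balance the two).

    scores: per-window drowsiness scores from the detection model
    labels: ground-truth flags (1 = drowsy, 0 = alert); both classes present
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t
        sensitivity = np.mean(pred[labels == 1])   # true positive rate
        specificity = np.mean(~pred[labels == 0])  # true negative rate
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

t = optimize_threshold([0.2, 0.8, 0.4, 0.9], [0, 1, 0, 1])  # -> 0.8
```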
We introduce TeSS (Text Similarity Comparison using Sentence Encoder), a framework for zero-shot classification in which the assigned label is determined by the embedding similarity between the input text and each candidate label prompt. We leverage representations from sentence encoders that were optimized during pre-training to place semantically similar samples closer to each other in embedding space. The label prompt embeddings serve as prototypes of their corresponding class clusters. Furthermore, to compensate for potentially poorly descriptive labels in their original format, we retrieve semantically similar sentences from external corpora and use them alongside the original label prompt (TeSS-R). TeSS outperforms strong baselines on various closed-set and open-set classification datasets in the zero-shot setting, with further gains when combined with label prompt diversification through retrieval. These results are robust to verbalizer variations, an ancillary benefit of using a bi-encoder. Altogether, our method serves as a reliable baseline for zero-shot classification and a simple interface for assessing the quality of sentence encoders.
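The core scoring rule of TeSS — embed the input and every label prompt with the same sentence encoder, then pick the most similar prompt — can be sketched with the sentence-transformers library. The model name and label prompts below are placeholder assumptions, not the paper's choices.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Placeholder encoder; the paper's choice of sentence encoder may differ.
encoder = SentenceTransformer("all-mpnet-base-v2")

def tess_predict(text, label_prompts):
    """Zero-shot classification: assign the label whose prompt embedding
    is most similar (cosine) to the input text embedding."""
    text_emb = encoder.encode([text], convert_to_tensor=True)
    label_embs = encoder.encode(label_prompts, convert_to_tensor=True)
    sims = cos_sim(text_emb, label_embs)  # shape (1, num_labels)
    return int(sims.argmax())

prompts = ["This text is about sports.", "This text is about politics."]
print(tess_predict("The match went to penalties.", prompts))  # -> 0
```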
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Recent studies on semi-supervised semantic segmentation (SSS) have seen fast progress. Despite their promising performance, however, current state-of-the-art methods tend toward increasingly complex designs at the cost of introducing more network components and additional training procedures. In this work, we instead follow a standard teacher-student framework and propose AugSeg, a simple and clean approach that focuses mainly on data perturbations to boost SSS performance. We argue that various data augmentations should be adjusted to better adapt to semi-supervised scenarios, rather than directly applied as in supervised learning. Specifically, we adopt a simplified intensity-based augmentation that selects a random number of data transformations, with distortion strengths uniformly sampled from a continuous space. Based on the model's estimated confidence on different unlabeled samples, we also randomly inject labeled information to augment the unlabeled samples in an adaptive manner. Without bells and whistles, our simple AugSeg readily achieves new state-of-the-art performance on SSS benchmarks under different partition protocols.
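A minimal sketch of the sampling scheme described above — a random number of intensity-based transforms, each with a distortion strength drawn uniformly from a continuous range — might look as follows. The op pool and strength mappings are illustrative stand-ins, not AugSeg's actual list.

```python
import random
from PIL import ImageEnhance, ImageOps

# Illustrative intensity-based ops; each maps (img, strength in [0, 1]) to a
# transformed image. These stand in for AugSeg's actual op pool.
OPS = [
    lambda img, s: ImageEnhance.Brightness(img).enhance(0.5 + s),
    lambda img, s: ImageEnhance.Contrast(img).enhance(0.5 + s),
    lambda img, s: ImageEnhance.Sharpness(img).enhance(0.5 + s),
    lambda img, s: ImageEnhance.Color(img).enhance(0.5 + s),
    lambda img, s: ImageOps.posterize(img, 4 + int(s * 4)),
]

def random_intensity_augment(img, max_ops=3):
    """Apply a random number of ops, each with a distortion strength drawn
    uniformly from a continuous range. Segmentation labels are untouched,
    since these are intensity-only (not geometric) transforms."""
    for op in random.sample(OPS, random.randint(1, max_ops)):
        img = op(img, random.uniform(0.0, 1.0))
    return img
```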
Flow-guided synthesis provides a common framework for frame interpolation, where optical flow is typically estimated by a pyramid network and then leveraged to guide a synthesis network in generating intermediate frames between the input frames. In this paper, we present UPR-Net, a novel Unified Pyramid Recurrent Network for frame interpolation. Cast in a flexible pyramid framework, UPR-Net exploits lightweight recurrent modules for both bi-directional flow estimation and intermediate frame synthesis. At each pyramid level, it leverages the estimated bi-directional flow to generate forward-warped representations for frame synthesis; across pyramid levels, it enables iterative refinement of both the optical flow and the intermediate frame. In particular, we show that our iterative synthesis significantly improves the robustness of frame interpolation in large-motion cases. Despite being extremely lightweight (1.7M parameters), UPR-Net achieves excellent performance on a broad range of benchmarks. Code will be available soon.
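A rough structural sketch of the pyramid recurrence — the same flow and synthesis modules reused (with shared weights) at every level, with both the flow and the intermediate frame upsampled and refined coarse-to-fine — is given below. The tiny one-layer modules are placeholders, and forward warping and other UPR-Net details are deliberately omitted.

```python
import torch
import torch.nn.functional as F

class TinyFlow(torch.nn.Module):
    """Stand-in recurrent flow module, reused (same weights) at every level."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(10, 4, 3, padding=1)  # 2 RGB frames + 4-ch bi-dir flow

    def forward(self, f0, f1, flow):
        return flow + self.net(torch.cat([f0, f1, flow], dim=1))

class TinySynth(torch.nn.Module):
    """Stand-in recurrent synthesis module, also reused across levels."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv2d(13, 3, 3, padding=1)  # frames + flow + previous estimate

    def forward(self, f0, f1, flow, mid):
        return mid + self.net(torch.cat([f0, f1, flow, mid], dim=1))

def interpolate(frame0, frame1, flow_net, synth_net, levels=3):
    """Coarse-to-fine recurrence: both the bi-directional flow and the
    intermediate frame are iteratively refined while the same two modules
    are reused at each pyramid level."""
    flow = mid = None
    for lvl in reversed(range(levels)):
        f0 = F.interpolate(frame0, scale_factor=0.5 ** lvl, mode="bilinear", align_corners=False)
        f1 = F.interpolate(frame1, scale_factor=0.5 ** lvl, mode="bilinear", align_corners=False)
        if flow is None:  # coarsest level: zero flow, naive blend
            flow = torch.zeros(f0.shape[0], 4, f0.shape[2], f0.shape[3])
            mid = 0.5 * (f0 + f1)
        else:             # upsample previous estimates to the finer grid
            flow = 2.0 * F.interpolate(flow, size=f0.shape[2:], mode="bilinear", align_corners=False)
            mid = F.interpolate(mid, size=f0.shape[2:], mode="bilinear", align_corners=False)
        flow = flow_net(f0, f1, flow)
        mid = synth_net(f0, f1, flow, mid)
    return mid

frames = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
out = interpolate(*frames, TinyFlow(), TinySynth())  # (1, 3, 64, 64)
```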
When training early-stage deep neural networks (DNNs), generating intermediate features via convolution or linear layers occupies most of the execution time. Accordingly, extensive research has been done to reduce the computational burden of these layers. In recent mobile-friendly DNNs, however, the relative number of operations involved in processing these layers has been significantly reduced, so the proportion of execution time spent in other layers, such as batch normalization layers, has increased. Thus, in this work, we conduct a detailed analysis of the batch normalization layer to efficiently reduce the runtime overhead of the batch normalization process. Backed by this thorough analysis, we present an extremely efficient batch normalization, named LightNorm, and its associated hardware module. In more detail, we fuse three approximation techniques: i) low bit-precision, ii) range batch normalization, and iii) block floating point. All of these approximation techniques are carefully utilized not only to maintain the statistics of intermediate feature maps but also to minimize off-chip memory accesses. By using the proposed LightNorm hardware, we can achieve significant area and energy savings during DNN training without hurting training accuracy. This makes the proposed hardware a strong candidate for on-device training.
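Of the three approximations, range batch normalization is easy to illustrate in isolation: the per-channel standard deviation is replaced by the value range scaled by 1/sqrt(2 ln n), which is cheaper to compute and more robust at low precision. The sketch below shows that approximation alone, in software, not LightNorm's fused hardware implementation.

```python
import math
import torch

def range_batch_norm(x, gamma, beta, eps=1e-5):
    """Range batch normalization sketch: the per-channel std is replaced by
    range(x - mean) * 1/sqrt(2*ln(n)), where n is the number of samples per
    channel. x has shape (N, C, H, W); gamma/beta have shape (1, C, 1, 1)."""
    n = x.shape[0] * x.shape[2] * x.shape[3]   # samples per channel
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    centered = x - mean
    value_range = (centered.amax(dim=(0, 2, 3), keepdim=True)
                   - centered.amin(dim=(0, 2, 3), keepdim=True))
    scale = 1.0 / math.sqrt(2.0 * math.log(n))  # std estimate = scale * range
    return gamma * centered / (scale * value_range + eps) + beta

x = torch.randn(32, 16, 8, 8)
y = range_batch_norm(x, torch.ones(1, 16, 1, 1), torch.zeros(1, 16, 1, 1))
```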
We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio-effects-related information from a reference music recording. All our models are trained in a self-supervised manner on an already-processed wet multitrack dataset, using an effective data preprocessing method that alleviates the scarcity of unprocessed dry data. We analyze the proposed encoder's ability to disentangle audio effects and validate its performance on mixing style transfer through both objective and subjective evaluations. The results show that the proposed system not only converts the mixing style of a multitrack to closely match a reference but is also robust for mixture-wise style transfer when combined with a music source separation model.
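While the paper's exact objective is not specified here, a contrastive encoder of this kind is typically trained with an InfoNCE-style loss in which two segments processed with the same audio effects chain form a positive pair and all other pairs in the batch are negatives. The sketch below is that generic recipe, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z_anchor, z_positive, temperature=0.1):
    """Generic InfoNCE sketch: embeddings of two segments processed with the
    same effects chain (rows i of z_anchor and z_positive) are pulled
    together; all mismatched pairs in the batch are pushed apart.
    z_anchor, z_positive: (B, D) embeddings from the effects encoder."""
    z_a = F.normalize(z_anchor, dim=1)
    z_p = F.normalize(z_positive, dim=1)
    logits = z_a @ z_p.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_a.shape[0])   # diagonal entries are the matches
    return F.cross_entropy(logits, targets)
```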
In semiconductor manufacturing, wafer map defect patterns provide critical information for facility maintenance and yield management, so classifying defect patterns is one of the most important tasks in the manufacturing process. In this paper, we propose a novel way to represent the shape of defect patterns as finite-dimensional vectors, which serve as input to a neural network classifier. The main idea is to extract the topological features of each pattern using the persistent homology theory of topological data analysis (TDA). Through experiments on a simulated dataset, we show that the proposed method is faster and more efficient to train than methods based on convolutional neural networks (CNNs), the most common approach to wafer map defect pattern classification. Moreover, our method outperforms CNN-based methods when the amount of training data is insufficient and imbalanced.
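As a hedged illustration of the overall recipe (binary wafer map, to persistence diagrams, to a fixed-length vector fed to a classifier), the sketch below uses the ripser library and a simple top-k-lifetime vectorization; the actual featurization in the paper may differ.

```python
import numpy as np
from ripser import ripser  # pip install ripser

def wafer_map_to_topo_vector(wafer_map, k=8):
    """Turn a binary wafer map into a fixed-length topological feature
    vector via persistent homology. The vectorization (top-k lifetimes of
    H0/H1) is a simple illustrative choice, not necessarily the paper's."""
    points = np.argwhere(wafer_map > 0).astype(float)  # defect coordinates
    diagrams = ripser(points, maxdim=1)["dgms"]        # H0 and H1 diagrams
    features = []
    for dgm in diagrams:
        lifetimes = dgm[:, 1] - dgm[:, 0]
        lifetimes = np.sort(lifetimes[np.isfinite(lifetimes)])[::-1]
        padded = np.zeros(k)
        padded[:min(k, len(lifetimes))] = lifetimes[:k]
        features.append(padded)
    return np.concatenate(features)  # length 2k, input to a classifier

wafer = (np.random.rand(32, 32) > 0.9).astype(int)
vec = wafer_map_to_topo_vector(wafer)  # length-16 feature vector
```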
Thermal analysis provides deeper insight into the behavior of electronic chips under different temperature scenarios and enables faster design exploration. However, obtaining detailed and accurate thermal profiles on a chip using FEM or CFD is very time-consuming, so there is an urgent need to speed up on-chip thermal solutions to handle a variety of system scenarios. In this paper, we propose a thermal machine learning (ML) solver to accelerate chip thermal simulation. The thermal ML solver is an extension of the recent novel approach CoAEMLSim (Composable Autoencoder Machine Learning Simulator), with modifications to the solution algorithm to handle both constant and distributed HTC. The proposed method is validated against commercial solvers such as ANSYS MAPDL, as well as the state-of-the-art ML baseline UNet, under different scenarios to demonstrate its enhanced accuracy, scalability, and generalizability.
Semantically meaningful sentence embeddings are important for many tasks in natural language processing. To obtain such embeddings, recent studies have explored the idea of using synthetic data generated by pretrained language models (PLMs) as training corpora. However, PLMs often produce sentences that differ substantially from human-written ones. We hypothesize that treating all of these synthetic examples equally when training a deep neural network can hurt the learning of semantically meaningful embeddings. To analyze this, we first train a classifier to identify machine-written sentences and observe that their linguistic features differ markedly from those of human-written sentences. Based on this, we propose a novel approach that first trains such a classifier to measure the importance of each sentence; the distilled information from the classifier is then used to train a reliable sentence embedding model. Through extensive evaluation on four real-world datasets, we demonstrate that our model trained on synthetic data generalizes well and outperforms existing baselines. Our implementation is publicly available at https://github.com/ddehun/coling2022_reweighting_sts.
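A minimal sketch of the first step — training a detector on human- versus machine-written sentences and using its probability output as a per-example importance score — is shown below with a simple TF-IDF plus logistic regression stand-in; the paper's classifier and exact weighting scheme may differ (see the linked repository).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def importance_weights(human_sents, machine_sents, synthetic_sents):
    """Train a simple human-vs-machine detector, then score each synthetic
    training sentence by how human-like it looks. The resulting weights can
    scale each example's contribution to the embedding-model loss.
    (A TF-IDF + logistic regression stand-in for the paper's classifier.)"""
    vec = TfidfVectorizer(ngram_range=(1, 2), max_features=50000)
    X = vec.fit_transform(human_sents + machine_sents)
    y = [1] * len(human_sents) + [0] * len(machine_sents)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.predict_proba(vec.transform(synthetic_sents))[:, 1]
```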